Data Extraction Based on Page Structure Analysis
Similar resources
A Multi-Page Data Extraction Service
We present a service-oriented architecture and a set of techniques for developing wrapper code generators, including the methodology of designing an effective wrapper program construction facility and a concrete implementation, called XWRAPComposer. Our wrapper generation framework has two unique design goals. First, we explicitly separate tasks of building wrappers that are specific to a Web s...
Data Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
Seismic structure extraction based on multi-scale sensitivity analysis
The exploration of geological composition, e.g. underground flow path, is a significant step for oil and gas search. However, to extract the structural geological composition from the volume, neither the classic volume exploration methods, e.g. transfer function design, nor the traditional volume cut algorithms can be directly used due to its three natural properties, various compositions, disc...
HTML Page Analysis Based on Visual Cues
In this paper, we present a novel approach to automatically analyzing semantic structure of HTML pages based on detecting visual similarities of content objects on web pages. The approach is developed based on the observation that in most web pages, layout styles of subtitles or records of the same content category are consistent and there are apparent separation boundaries between different ca...
Chapter 3.24 XWRAPComposer: A Multi-Page Data Extraction Service
We present a service-oriented architecture and a set of techniques for developing wrapper code generators, including the methodology of designing an effective wrapper program construction facility and a concrete implementation, called XWRAPComposer. Our wrapper generation framework has two unique design goals. First, we explicitly separate tasks of building wrappers that are specific to a Web s...
Journal
Journal title: MATEC Web of Conferences
Year: 2017
ISSN: 2261-236X
DOI: 10.1051/matecconf/201713900118